Browser Use: Autonomous Browser Automation

Two complementary tools for browser automation:

Tool | Best for | How it works
agent-browser | Step-by-step control, scraping, form filling | CLI commands; you drive each action
browser-use | Complex autonomous tasks | Python agent that decides actions itself

Quick Start

agent-browser (recommended for most tasks)

Navigate and inspect

agent-browser open "https://example.com"

agent-browser snapshot -i

Get interactive elements with @refs

Interact using refs

agent-browser click @e3

Click element

agent-browser fill @e2 "text"

Fill input (clears first)

agent-browser press Enter

Press key

Extract data

agent-browser get text @e1

Get element text

agent-browser get attr @e1 href

Get attribute

agent-browser screenshot /tmp/p.png

Screenshot

Done

agent-browser close

browser-use (autonomous agent)

Run a full autonomous browsing task

browser-use-agent "Find the pricing for Notion and compare plans"

The agent will navigate, click, read pages, and return a structured result.

agent-browser: Full Reference

Navigation

agent-browser open <url>

Navigate to URL

agent-browser back

Go back

agent-browser forward

Go forward

agent-browser reload

Reload page

agent-browser close

Close browser

Snapshot (page analysis)

agent-browser snapshot

Full accessibility tree

agent-browser snapshot -i

Interactive elements only (recommended)

agent-browser snapshot -c

Compact output

agent-browser snapshot -d 3

Limit depth to 3

agent-browser snapshot -s "#main"

Scope to CSS selector

agent-browser snapshot -i --json

JSON output for parsing
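As a rough illustration of consuming the `--json` output from a script, the sketch below runs the snapshot command and walks the decoded result. The JSON schema is not described here, so the code only assumes that stdout is valid JSON; the helper names `snapshot_json` and `find_values`, and the idea of searching for a key such as `ref`, are hypothetical.

```python
import json
import subprocess
from typing import Any, Iterator


def snapshot_json(run=subprocess.run) -> Any:
    """Run `agent-browser snapshot -i --json` and decode stdout as JSON.

    `run` is injectable so the function can be exercised without a browser.
    """
    proc = run(
        ["agent-browser", "snapshot", "-i", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a decoded JSON tree.

    The key to search for (for example "ref") is an assumption about the
    output; inspect the real JSON before relying on it.
    """
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)
```

A call such as `list(find_values(snapshot_json(), "ref"))` would then collect whatever sits under that key, if the output uses it at all.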

Interactions (use @refs from snapshot)

agent-browser click @e1

Click

agent-browser dblclick @e1

Double-click

agent-browser fill @e2 "text"

Clear and type (use this for inputs)

agent-browser type @e2 "text"

Type without clearing

agent-browser press Enter

Press key

agent-browser press Control+a

Key combination

agent-browser hover @e1

Hover

agent-browser check @e1

Check checkbox

agent-browser uncheck @e1

Uncheck checkbox

agent-browser select @e1 "value"

Select dropdown option

agent-browser scroll down 500

Scroll page

agent-browser scrollintoview @e1

Scroll element into view

agent-browser drag @e1 @e2

Drag and drop

agent-browser upload @e1 file.pdf

Upload files

Extract Data

agent-browser get text @e1

Get element text

agent-browser get html @e1

Get innerHTML

agent-browser get value @e1

Get input value

agent-browser get attr @e1 href

Get attribute

agent-browser get title

Page title

agent-browser get url

Current URL

agent-browser get count ".item"

Count matching elements

Wait

agent-browser wait @e1

Wait for element

agent-browser wait 2000

Wait milliseconds

agent-browser wait --text "Done"

Wait for text to appear

agent-browser wait --url "/dash"

Wait for URL pattern

agent-browser wait --load networkidle

Wait for network idle

Screenshots, PDF & Recording

agent-browser screenshot path.png

Save screenshot

agent-browser screenshot --full

Full page screenshot

agent-browser pdf output.pdf

Save as PDF

agent-browser record start ./demo.webm

Start recording

agent-browser record stop

Stop and save

Sessions (parallel browsers)

agent-browser --session s1 open "https://site1.com"

agent-browser --session s2 open "https://site2.com"

agent-browser session list

State (persist auth/cookies)

agent-browser state save auth.json

Save session (cookies, storage)

agent-browser state load auth.json

Restore session
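One way to use `state save` and `state load` from a script is to load the saved file when it exists and fall back to a fresh login otherwise. This is a minimal sketch: `ensure_logged_in`, the `login` callback, and the `agent_browser` wrapper are hypothetical names, and whether the state must be loaded before or after opening a page is not covered here.

```python
import subprocess
from pathlib import Path
from typing import Callable


def agent_browser(*args: str) -> None:
    """Invoke the agent-browser CLI and raise if it exits with an error."""
    subprocess.run(["agent-browser", *args], check=True)


def ensure_logged_in(
    state_file: str,
    login: Callable[[], None],
    cli: Callable[..., None] = agent_browser,
) -> bool:
    """Reuse saved auth state when available, otherwise log in and save it.

    Returns True if an existing state file was loaded, False if `login`
    had to run. `login` is caller-supplied and performs the site-specific
    open/fill/click steps.
    """
    if Path(state_file).exists():
        cli("state", "load", str(state_file))
        return True
    login()
    cli("state", "save", str(state_file))
    return False
```

If the saved session later expires, deleting the state file forces the login path on the next run.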

Cookies & Storage

agent-browser cookies

Get all cookies

agent-browser cookies set name value

Set cookie

agent-browser cookies clear

Clear cookies

agent-browser storage local

Get all localStorage

agent-browser storage local set k v

Set value

Tabs & Frames

agent-browser tab

List tabs

agent-browser tab new [url]

New tab

agent-browser tab 2

Switch to tab

agent-browser frame "#iframe"

Switch to iframe

agent-browser frame main

Back to main frame

Browser Settings

agent-browser set viewport 1920 1080

agent-browser set device "iPhone 14"

agent-browser set geo 37.7749 -122.4194

agent-browser set offline on

agent-browser set media dark

JavaScript

agent-browser eval "document.title"

Run JS in page context

browser-use: Autonomous Agent

For complex tasks where you want the agent to figure out the browsing steps:

browser-use-agent "Your task description here"

Custom Script (advanced)

Run via: /opt/browser-use/bin/python3 script.py

import asyncio
import os

from browser_use import Agent, Browser
from langchain_anthropic import ChatAnthropic


async def run():
    browser = Browser()
    llm = ChatAnthropic(
        model='claude-sonnet-4-20250514',
        api_key=os.environ['ANTHROPIC_API_KEY'],
    )
    agent = Agent(
        task="Compare pricing on 3 competitor sites",
        llm=llm,
        browser=browser,
    )
    result = await agent.run(max_steps=15)
    await browser.close()
    return result


asyncio.run(run())

You can swap the LLM for any langchain-compatible model (OpenAI, Anthropic, etc.).

Standard Workflow

1. Open page

agent-browser open "https://example.com"

2. Snapshot to see what's on the page

agent-browser snapshot -i

3. Interact with elements using @refs from snapshot

agent-browser fill @e1 "search query"

agent-browser click @e2

4. Wait for new page to load

agent-browser wait --load networkidle

5. Re-snapshot (refs change after navigation!)

agent-browser snapshot -i

6. Extract what you need

agent-browser get text @e5

7. Close when done

agent-browser close
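The seven steps above can be sketched as a single Python function that shells out to the CLI. This is an illustration under assumptions, not a reference implementation: `cli_runner` and `search_and_extract` are hypothetical names, the refs are passed in by the caller (in practice you would read them from the snapshot output), and the `run` parameter exists only so the sequence can be dry-run without a browser.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def cli_runner(args: List[str]) -> str:
    """Run one agent-browser subcommand and return its stdout."""
    proc = subprocess.run(
        ["agent-browser", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def search_and_extract(
    url: str,
    query: str,
    input_ref: str,
    submit_ref: str,
    result_ref: str,
    run: Runner = cli_runner,
) -> str:
    """Open a page, submit a query, and return the text of one element."""
    run(["open", url])                          # 1. Open page
    try:
        run(["snapshot", "-i"])                 # 2. Snapshot
        run(["fill", input_ref, query])         # 3. Interact using refs
        run(["click", submit_ref])
        run(["wait", "--load", "networkidle"])  # 4. Wait for the new page
        run(["snapshot", "-i"])                 # 5. Re-snapshot
        return run(["get", "text", result_ref]).strip()  # 6. Extract
    finally:
        run(["close"])                          # 7. Close, even on failure
```

Putting `close` in a `finally` block keeps the browser from lingering when an intermediate step raises.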

Important Rules

Always snapshot -i after navigation: refs change on every page load.

Use fill, not type, for inputs: fill clears existing text first.

Wait after clicks that trigger navigation: wait --load networkidle

Close the browser when done: agent-browser close

Google/Bing block headless browsers (CAPTCHA): use DuckDuckGo or web_search instead.

Save auth state for sites requiring login: state save/load

Use --json when you need machine-parseable output.

Use sessions for parallel browsing: --session
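To illustrate the last rule, here is a hedged sketch that drives several named sessions from Python threads. The function names are hypothetical, and it assumes `--session` can prefix `get title` and `close` the same way it prefixes `open`; only the `open` form appears in the reference above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def session_command(session: str, *args: str) -> List[str]:
    """Build an agent-browser command line bound to a named session."""
    return ["agent-browser", "--session", session, *args]


def fetch_title(session: str, url: str, run=subprocess.run) -> str:
    """Open `url` in its own session and return the page title."""
    run(session_command(session, "open", url), check=True)
    try:
        proc = run(
            session_command(session, "get", "title"),
            capture_output=True,
            text=True,
            check=True,
        )
        return proc.stdout.strip()
    finally:
        # Assumption: close accepts --session like the other subcommands.
        run(session_command(session, "close"), check=True)


def fetch_titles(urls: List[str], run=subprocess.run) -> Dict[str, str]:
    """Fetch titles for all URLs concurrently, one session per URL."""
    sessions = [f"s{i}" for i in range(1, len(urls) + 1)]
    with ThreadPoolExecutor(max_workers=len(urls) or 1) as pool:
        titles = pool.map(
            lambda pair: fetch_title(pair[0], pair[1], run),
            zip(sessions, urls),
        )
        return dict(zip(urls, titles))
```

Threads are sufficient here because each worker spends its time waiting on a CLI subprocess rather than running Python code.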

Troubleshooting

Element not found: Re-run snapshot -i to get current refs

Page not loaded: Add wait --load networkidle after navigation

CAPTCHA on search engines: Use DuckDuckGo or the web_search tool instead

Auth expired: Re-login and state save again

Display errors: The install script sets up Xvfb for headless rendering
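The "Element not found" fix can be automated with a small retry loop that re-snapshots before each attempt. This sketch rests on two assumptions that are not confirmed above: that the CLI exits with a non-zero status when an element is missing, and that the caller can supply a `pick_ref` function that extracts the wanted ref from the snapshot text. All names here are hypothetical.

```python
import subprocess
from typing import Callable, List


def cli_runner(args: List[str]) -> str:
    """Run one agent-browser subcommand and return its stdout."""
    proc = subprocess.run(
        ["agent-browser", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def act_with_fresh_ref(
    action: List[str],
    pick_ref: Callable[[str], str],
    run: Callable[[List[str]], str] = cli_runner,
    attempts: int = 2,
) -> str:
    """Snapshot, pick a ref, run `action` on it; re-snapshot and retry on failure.

    `action` is the subcommand without the ref, e.g. ["click"] or ["get", "text"].
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        snapshot = run(["snapshot", "-i"])  # refs may have changed; refresh them
        ref = pick_ref(snapshot)
        try:
            return run([*action, ref])
        except subprocess.CalledProcessError as err:
            last_error = err  # assumed signal for a stale or missing ref
    raise last_error
```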
